13 research outputs found

    Lossless seeds for searching short patterns with high error rates

    We address the problem of approximate pattern matching using the Levenshtein distance. Given a text T and a pattern P, find all locations in T that differ by at most k errors from P. For that purpose, we propose a filtration algorithm based on a novel type of seeds that combine exact parts with parts containing a fixed number of errors. Experimental tests show that the method is particularly well suited to short patterns with a large number of errors.
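To make the problem statement concrete, here is a minimal sketch of the classic dynamic-programming scan (often attributed to Sellers) that reports every position in T where an occurrence of P with at most k errors ends. It is a textbook baseline, not the filtration algorithm of the paper; the function name `approx_end_positions` and the convention of reporting 1-based end positions are assumptions made for this illustration.

```python
def approx_end_positions(text, pattern, k):
    """Return the 1-based end positions j such that some substring of
    `text` ending at j is within Levenshtein distance k of `pattern`."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    # Before reading any text, matching pattern[:i] costs i deletions.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]          # value of row i-1 in the previous column
        col[0] = 0                  # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost,   # match or substitution
                      col[i] + 1,         # extra character in the text
                      col[i - 1] + 1)     # missing character in the text
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(approx_end_positions("abcdefg", "cde", 0))
    print(approx_end_positions("abcdefg", "cde", 1))
```

The scan costs time proportional to |T| x |P| whatever the value of k, which is the cost a filtration step tries to avoid by verifying only promising regions of the text.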

    Algorithmique pour la recherche de motifs approchée et application à la recherche de cibles de microARN

    No full text
    Approximate string matching consists in identifying the occurrences of a motif within a text, up to a given distance. This problem has many applications in bioinformatics for the analysis of biological sequences. For instance, microRNAs are short RNA molecules that regulate gene expression by specifically recognizing a sequence motif on the target gene. Understanding the mode of action of microRNAs requires the ability to identify short motifs, around 21 nucleotides long and containing up to 3-4 errors, in a text of about 10^8 to 10^9 nucleotides representing a genome. In this thesis, I propose an efficient algorithm for the approximate search of short motifs. The algorithm is based on a new type of seeds containing errors, the 01*0 seeds, and uses a compressed index structure, the FM-index. I have implemented it in freely available software, Bwolo. I demonstrate experimentally the advantage of this approach by comparing it with state-of-the-art tools. I also show how Bwolo can be used, through an original study of the distribution of potential miRNA target sites in two plant genomes, Arabidopsis thaliana and Arabidopsis lyrata.

    Détection des protéines de surface de Propionibacterium freudenreichii

    No full text
    Document type: DISSERTATION (a product type whose metadata do not match those expected for the other product types). Propionibacterium freudenreichii is a Gram-positive bacterium known as a probiotic. A probiotic is a live microorganism which, when administered in adequate amounts, confers a health benefit on the host. The beneficial effects arise from various interactions with other microorganisms and with the host's cells, including the immune system. These interactions are mediated largely by surface compounds, among which surface proteins have been shown to be important in P. freudenreichii. An in vivo method for labeling surface proteins with CyDye fluorescent labels is developed here. A Western blot shows that no lysis occurs during labeling, but that lysis does occur during the lysozyme treatment needed to extract the proteins. Observation of the cells by TEM reveals that lysozyme only destabilized the cell wall. The spots that were both fluorescent on a 2D gel and visible after Coomassie blue staining were analyzed by MS-MS. Most spots contain several proteins, often including at least one surface protein. It will therefore be necessary to use higher-resolution gels and to combine this method with another based on shaving surface proteins with trypsin.

    Algorithmic for approximate string matching and application for the search of microRNA targets

    No full text
    Approximate string matching consists in identifying the occurrences of a motif within a text, up to a given distance. This problem has many applications in bioinformatics for the analysis of biological sequences. For instance, microRNAs are short RNA molecules that regulate gene expression by specifically recognizing a sequence motif on the target gene. Understanding the mode of action of microRNAs requires the ability to identify short motifs, around 21 nucleotides long and containing up to 3-4 errors, in a text of about 10^8 to 10^9 nucleotides representing a genome. In this thesis, I propose an efficient algorithm for the approximate search of short motifs. The algorithm is based on a new type of seeds containing errors, the 01*0 seeds, and uses a compressed index structure, the FM-index. I have implemented it in freely available software, Bwolo. I demonstrate experimentally the advantage of this approach by comparing it with state-of-the-art tools. I also show how Bwolo can be used, through an original study of the distribution of potential miRNA target sites in two plant genomes, Arabidopsis thaliana and Arabidopsis lyrata.

    Recherche et annotation des structures de CRISPR dans l'ensemble des génomes procaryotes

    CRISPR structures are series of repeated units that are widespread in prokaryotic genomes but absent from eukaryotic genomes. We have previously built a relational database, Crispi, which gathers all known CRISPRs and is kept up to date by an automatic update process. The aim here is to enrich the database by annotating the structures found within CRISPR elements. These palindromic structures (stem-loops) were identified with a grammatical model, using the pattern-matching software Logol. Defining a stem-loop model suited to CRISPRs was one of the delicate points of this work.

    Approximate search of short patterns with high error rates using the 01⁎0 lossless seeds

    Approximate pattern matching is an important computational problem with a wide range of applications in computational biology and information retrieval. However, searching for a short pattern in a text with high error rates (10–20%) under the Levenshtein distance is a task for which few efficient solutions exist. Here we address this problem by introducing a new type of seeds: the 01⁎0 seeds. These seeds are made of two exact parts separated by parts with exactly one error. We show that those seeds are lossless, and we apply them to two filtration algorithms for two popular applications, one where a compressed index is built on the text and another where the patterns are indexed. We also demonstrate experimentally the advantages of our approach compared to alternative methods implementing other types of seeds. This work opens the way to the design of more efficient and more sensitive text algorithms.
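For readers unfamiliar with lossless seed filtration, the sketch below shows the simplest classical variant, which is not the 01⁎0 scheme of the paper: the pattern is cut into k+1 pieces, and by the pigeonhole principle any occurrence with at most k errors must contain at least one piece unchanged. Exact hits of the pieces select small text windows, and only those windows are verified by dynamic programming. All function names, and the choice to report 1-based end positions, are assumptions made for this illustration.

```python
def seeds(pattern, k):
    """Cut `pattern` into k+1 contiguous non-empty pieces; return (offset, piece) pairs."""
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than k")
    cuts = [i * m // (k + 1) for i in range(k + 2)]
    return [(cuts[i], pattern[cuts[i]:cuts[i + 1]]) for i in range(k + 1)]


def exact_occurrences(text, piece):
    """Yield every start position of `piece` in `text` (overlaps included)."""
    pos = text.find(piece)
    while pos != -1:
        yield pos
        pos = text.find(piece, pos + 1)


def verify(window, pattern, k):
    """Dynamic-programming check: local 1-based end positions in `window`
    where an occurrence of `pattern` with at most k errors ends."""
    m = len(pattern)
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(window, start=1):
        prev_diag, col[0] = col[0], 0
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost, col[i] + 1, col[i - 1] + 1)
            prev_diag, col[i] = col[i], new
        if col[m] <= k:
            ends.append(j)
    return ends


def filtered_search(text, pattern, k):
    """Pigeonhole filtration: verify only the windows around exact seed hits."""
    m = len(pattern)
    hits = set()
    for offset, piece in seeds(pattern, k):
        for pos in exact_occurrences(text, piece):
            # An occurrence containing this seed hit starts at most k
            # positions away from pos - offset and spans at most m + k
            # further characters.
            start = max(0, pos - offset - k)
            end = min(len(text), pos - offset + m + k)
            for local_end in verify(text[start:end], pattern, k):
                hits.add(start + local_end)
    return sorted(hits)


if __name__ == "__main__":
    print(filtered_search("abcdefg", "cde", 1))
```

With short patterns and high error rates the k+1 exact pieces become very short and match almost everywhere, so the filter discards little of the text; that is the regime the abstract identifies as having few efficient solutions.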

    The probiotic Propionibacterium freudenreichii surface proteome

    No full text
    Surface proteins are key actors in the complex interactions between bacteria (pathogens, commensals or symbionts) and their host. In beneficial (probiotic) bacteria, they participate in competition with pathogens, adhesion to host cells, and immunomodulation. We investigated such proteins in the beneficial bacterium Propionibacterium freudenreichii, which is consumed both in Swiss-type cheeses and in probiotic preparations. The P. freudenreichii genome was sequenced and annotated, and the localization of the encoded proteins was predicted using SurfG+. A combination of three biochemical methods confirmed the surface exposure of P. freudenreichii proteins: shedding, shaving and labeling. Shedding consisted of extracting cell-wall-associated proteins with guanidine, followed by trypsinolysis of the extracted proteins. Shaving consisted of enzymatic hydrolysis of surface-protruding proteins that were accessible to trypsin in situ on live bacteria. For labeling, an NHS-ester-cyanine was added to live bacteria in order to label surface proteins, prior to 2-D electrophoresis and detection of fluorescent protein spots. For all three methods, the resulting tryptic peptides were identified by NanoLC-MS/MS on a Q-TOF mass spectrometer. This combination of methods allowed the identification of surface-layer-type proteins, lipoproteins, proteins associated with the cell wall or the membrane, proteins predicted to be secreted, and moonlighting proteins predicted to be cytoplasmic. Some of these proteins are known to participate in adhesion and in the modulation of the immune response by probiotics. This work constitutes a decisive step in elucidating the ability of P. freudenreichii to interact with host cells and in understanding protein sorting in this bacterium.

    Surface proteins of Propionibacterium freudenreichii are involved in its anti-inflammatory properties.

    No full text
    Propionibacterium freudenreichii is a food-grade bacterium consumed both in cheeses and in probiotic preparations. Its promising immunomodulatory potential relies largely on the presence of specific, strain-dependent surface compounds. The CIRM BIA 129 strain was selected for its immunomodulatory properties, and its surfaceome was deciphered using an integrative approach. Surface Layer Associated Proteins (SLAPs) are involved in the anti-inflammatory properties.